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Abstract In this paper we propose a new inexact dual decomposition algorithm for solving 
separable convex optimization problems. This algorithm is a combination of three tech- 
niques: dual Lagrangian decomposition, smoothing and excessive gap. The algorithm re- 
quires only one primal step and two dual steps at each iteration and allows one to solve 
the subproblem of each component inexactly and in parallel. Moreover, the algorithmic pa- 
rameters are updated automatically without any tuning strategy as in augmented Lagrangian 
approaches. We analyze the convergence of the algorithm and estimate its O ( j) worst-case 
complexity. Numerical examples are implemented to verify the theoretical results. 

Keywords Smoothing technique • excessive gap ■ Lagrangian decomposition • inexact first 
order method ■ separable convex optimization ■ distributed and parallel algorithm 



1 Introduction 

Many practical optimization problems must be addressed within the framework of large- 
scale structured convex optimization and need to be solved in a parallel and distributed 
manner. Such problems may appear in many fields of science and engineering: e.g. graph 
theory, networks, transportation, distributed model predictive control, distributed estima- 
tion and multistage stochastic optimization, see e.g. fTTl 1 5 ITT7 , 18.29.32 37 38.40] and the 
references quoted therein. Solving large-scale optimization problems is still a challenge in 
many applications 1 6 1 due to the limitations of computational devices and computer systems. 
Recently, thanks to the development of parallel and distributed computer systems, many 
large-scale problems have been solved by using the framework of decomposition. However, 
methods and algorithms for solving this type of problems, which can be performed in a 
parallel or distributed manner, are still limited J2]|6]. 



Quoc Tran Dinh ■ Moritz Diehl are with Department of Electrical Engineering (ES AT-SCD) and Optimization 
in Engineering Center (OPTEC), KU Leuven, Kasteelpark Arenberg 10, B-3001 Heverlee, Belgium. 
E-mail: {quoc . trandinh, moritz . diehl}9esat .kuleuven.be 

Ion Necoara is with the Automation and Systems Engineering Department, University Politehnica Bucharest, 
060042 Bucharest, Romania; E-mail: ion.necoara@acse.pub.ro 

Quoc Tran Dinh is with VNU University of Science, Hanoi, Vietnam. Currently, he is working as a postdoc- 
toral researcher at Laboratory for Information and Inference Systems, EPFL, 1015-Lausanne, Switzerland. 



2 



Quoc Tran-Dinh et al. 



In this paper we develop a new optimization algorithm to solve the following struc- 
tured convex optimization problem with a separable objective function and coupling linear 
constraints: 



where, for every i £ {1,2, . . . ,M}, : R"' — s- M is convex (not necessarily strictly convex) 
and possibly nonsmooth functions, X, G R"< is nonempty, closed and convex sets, A,- 6 M mx "'' 

and 6 K m , and «i + m H + «m = n. Here, x,- 6 X/ is referred to as a local convex 

constraint and the final constraint is called coupling linear constraint. 

In the literature, several approaches based on decomposition techniques have been pro- 
posed to solve problem (0. In order to observe the differences between those methods and 
our approach in this paper, we briefly classify some of these that we found most related. 
The first class of algorithms is based on Lagrangian relaxation and subgradient methods of 
multipliers [2 8 , 24|. It has been observed that subgradient methods are usually slow and 
numerically sensitive to the choice of step sizes in practice 1251 . Moreover, the convergence 
rate of these methods is in general 0(1 /\fk), where k is the iteration counter. The second 
approach relies on augmented Lagrangian functions, see e.g. I13U31I . Many variants were 
proposed and tried to process the inseparability of the crossproduct terms in the augmented 
Lagrangian function in different ways. Besides this approach, the authors in 1141 consid- 
ered the dual decomposition based on Fenchel's duality theory. Another research direction 
is based on alternating direction methods which were studied, for example, in Il3lllllll2l[l9l . 
Alternatively, proximal point-type methods were extended to the decomposition framework, 
see, e.g. [4,36]. Other researchers employed interior point methods in the framework of de- 
composition such as [17 21 23 34 401. Furthermore, the mean value cross decomposition 
in 1161 . the partial inverse method in [33 1 and the accelerated gradient method of multipliers 
in 1221 were also proposed to solve problem ([T}. We note that decomposition and splitting 
methods are very well developed in convex optimization, especially in generalized equations 
and variational inequalities, see e.g. I5ll9l l281 . Recently, we have proposed a new decompo- 
sition method to solve problem <|TJ in 1 35 1 based on two primal steps and one dual step. It is 
proved that the convergence rate of the algorithm is 0(1 /k) which is much better than the 
subgradient-type methods of multipliers [2] but its computational complexity per iteration 
is higher that of these classical methods. Moreover, the algorithm uses an automatic strategy 
to update the parameters which improves the numerical efficiency in practice. 

In this paper, we propose a new inexact decomposition algorithm for solving l[T} which 
employs smoothing techniques 1 10 1 and excessive gap condition [26]. 

Contribution. The contribution of the paper is as follows: 

1. We propose a new decomposition algorithm based on inexact dual gradients. This algo- 
rithm requires only one primal step and two dual steps at each iteration and allows one 
to solve the subproblem of each component inexactly and in parallel. Moreover, all the 
algorithmic parameters are updated automatically without using any tuning strategy. 

2. We prove the convergence of the proposed algorithm and show that the convergence rate 
is O (t) , where k is the iteration counter. Due to the automatic update of the algorithmic 
parameters and the low computational complexity per iteration, the proposed algorithm 
performs better than some related existing decomposition algorithms from the literature 
in terms of computational time. 




s.t. xi eXi (i = 1,...,M), 



(1) 



M 
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3. An extension to a switching strategy is also presented. This algorithm updates simulta- 
neously two smoothness parameters at each iteration and makes use of the inexactness 
of the gradients of the smoothed dual function. 

Let us emphasize the following points of the contribution. The first algorithm proposed 
in this paper consists of two dual steps and one primal step per iteration. This requires solv- 
ing the primal subproblems in parallel only once but needs one more dual step. Because the 
dual step corresponds only to a simple matrix-vector multiplication, the computational cost 
of the proposed algorithm is significantly reduced compared to some existing decomposition 
methods in the literature. Moreover, since solving the primal subproblems exactly is only 
conceptual (except existing a closed form solution), we propose an inexact algorithm which 
allows one to solve these problems up to a given accuracy. The accuracies of solving the 
primal subproblems are adaptively chosen such that the convergence of the whole algorithm 
is preserved. The parameters in the algorithm are updated automatically based on an analy- 
sis of the iteration scheme. This is different from augmented Lagrangian approaches Bi ll 11 
where we need to find an appropriate way to tune the penalty parameter in each practical 
situation. 

In the switching variant, apart from the inexactness, this algorithm allows one to update 
simultaneously both smoothness parameters at each iteration. The advantage of this algo- 
rithm compared to the first one is that it takes into account the convergence behavior of the 
primal and dual steps which accelerates the convergence of the algorithm in some practical 
situations. Since both algorithms are primal-dual methods, we not only obtain an approxi- 
mate solution of the dual problem but also an approximate solution of the original problem 
|Q~|| without any auxiliary computation. 

Paper outline. The rest of this paper is organized as follows. In the next section, we briefly 
describe the Lagrangian dual decomposition method for separable convex optimization. Sec- 
tion [3] mainly presents the smoothing technique via prox-functions as well as the inexact 
excessive gap condition. Section [4] builds a new inexact algorithm called inexact decompo- 
sition algorithm with two dual steps (Algorithm [T}. The convergence rate of this algorithm 
is established. Section [5] presents an inexact switching variant of Algorithm [T] proposed in 
Section [4] Numerical examples are presented in Section [6] to examine the performance of 
the proposed algorithms and a numerical comparison is made. In order to make the paper 
more compact, we move some the technical proofs to Appendix lAl 

Notation. Throughout the paper, we shall consider the Euclidean space W endowed with 
an inner product x T y for x,y € W and the norm \\x\\ := \/x T x. For a given matrix A 6 W nx ", 
the spectral norm ||A|| is used in the paper. The notation x= (jci, . . . ,xm) represents a column 
vector in R", where xi is a subvector in R"' and i = 1, . ..,M. We denote by R + and R ++ 
the sets of nonnegative and positive real numbers, respectively. We also use df for the 
subdifferential of a convex function /. For a given convex set X in R", we denote ri(X) the 
relative interior of X, see, e.g. F501 . 



2 Lagrangian dual decomposition 

In this section, we briefly describe the Lagrangian dual decomposition technique in convex 
optimization, see, e.g. (2). Letx := (jq, . . . ,xm) be a vector and A := [Ai, . . . ,Am\ be a matrix 
formed from M components x-, and A;, respectively. Let b := Y!\L\ °i ar, d X:=X\ x ■ ■ ■ x Xm- 
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The Lagrange function associated with the coupling constraint Ax — b = is defined by 

M 

Sf(x,y) :=$(x)+y T (Ax-b) = £ [<j) i (x i )+y T (A i x i -b i )] , 

where y is the Lagrange multiplier associated with Ax — b = 0. The dual problem of (0 is 
written as 

g* := max g(y), (2) 

where g(-) is the dual function defined by 

g(y) := min j£f = min{(j){x)+y T (Ax-b)}. (3) 

Note that the dual function g can be computed in parallel for each component X; as 

M 

giy) :=Fgi(y), where g ! (j):=rrun{^(xj)+/(A i Xi-fci)} ; i=l,...,Af. (4) 
. =1 x,eXi 

We denote a solution of the minimization problem in ©. Consequently, x*(y) := 

(x*(y), . . .,Xm (y)) is a solution of <[3j. It is well-known that g is concave and the dual problem 
is convex but nondifferentiable in general. 

Throughout the paper, we assume that the following assumptions hold 1311 . 

Assumption A. 2.1 The solution setX* of (|T]| is nonempty and either X is polyhedral or the 
Slater constraint qualification condition for problem 10 holds, i.e. 

{x 6l" | Ax- ft = 0} n ii(X) 0. (5) 

For each i 6 {1,2,..., M}, (j), is proper, lower semicontinuous and convex in W'. 

If X is convex and bounded then X* is also convex and bounded. Note that the objective 
function (j) is not necessarily smooth. For example, (j)(x) := ||jc||i = £" =1 |jC(,-)|, which is 
nonsmooth and separable, can be handled in our framework. Under Assumption A J2. II the 
solution set Y* of the dual problem (|2} is nonempty and bounded. Moreover, strong duality 
condition holds, i.e. for all (x*,y*) e X* x Y* we have (j)* = (j)(x*) = g(y*) = g*. If strong 
duality holds then we can refer to g* or (j)* as the primal-dual optimal value. 



3 Smoothing via prox-functions 

Since the dual function g is in general nonsmooth, one can apply smoothing techniques 
to approximate g up to a desired accuracy. In this section, we propose to use a smoothing 
technique via proximity functions proposed in [ 10 1. 

3.1. Proximity functions. Let C be a nonempty, closed and convex set in R". We consider 
a nonnegative, continuous and strongly convex function pc '■ C —> R + with a convexity pa- 
rameter Up > 0. As usual, we call pc a proximity function (prox-function) associated with 
the convex set C. Let 

p* c :=min p c (x) and D c := supp c (x). (6) 

•tec xeC 

Since pc is strongly convex, there exists a unique point / eC such that p* c = pc{xF). 
The point x c is called the proximity center of C w.r.t. pc- Moreover, if C is bounded then 
< p* c < Dc < +°°. Without loss of generality, we can assume that p* c > 0. Otherwise, we 
can shift this function as p~c( x ) Pc{ x ) + r o, where ro + p* c > 0. 
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Remark 3. 1 We note that the simplest prox-function is the quadratic form pc(x) : = -f- 1 \x — 
x^W 2 + r, where r > 0, O p > and feC are given. If the set C has a specific structure 
then one can choose an appropriate prox-function that captures better the structure of C than 
the quadratic prox-function. For example, if C is a standard simplex, one can choose the 
entropy prox-function as mentioned in II 1 01 - If C has no specific structure, then we can use 
the quadratic prox-function given above. Consequently, the convex problem generated using 
quadratic prox-functions reduces in some cases to a simple optimization problem, so that its 
solution can be computed numerically very efficient. 

3.2. Smoothed approximations. In order to build smoothed approximations of the objective 
function (j) and the dual function g in the framework of the primal-dual smoothing technique 
proposed in [26 1, we make the following assumption. 

Assumption A. 3.1 Each feasible setXj admits a prox-function px t with a convexity param- 
eter (7, > and the proximity center x^. Further, we assume 

< Px: '■= min/? x .(x,) < D Xj := sup p Xi {xi) < +°°, i = l,...,M. 

xeX < A-,-eXj 

If Xi is bounded for i = 1, . . . ,M, then Assumption l3.1l is satisfied. If X,- is unbounded, then 
we can assume that our sample points generated by the proposed algorithms are bounded. 
In this case, we can restrict the feasible set of problem ([TJ onlflC, where C is a given 
compact set which contains the sample points and the desired solutions of (TJ. 
We denote by 

MM M 

Px(x) ■= X>x,(-*i), Px ■= £/4; > °» andD x : = H D x, < +°°- ( 7 ) 

;=i ;=i i=i 

Since 0, is not necessarily strictly convex, the function g, defined by (O may not be differ- 
entiable. We consider the following function 

M 

g(y;pi) :=Ygi(y;pi), where gi (y;pi) := min {fcfo) +y T (A iXi - b t ) +p lPXl ,(*<)} , (8) 
i=i Xi&i 

for i = 1 , . . . , M and /3i > is a given smoothness parameter. We denote x* (y; jSi ) the unique 
solution of (SJ, i.e. 

x*(y;j6i) :=argpim{(l> i (x i )+y T {A i x i -b i )+l3ipx l {x i )}, i=\,...,M, (9) 

XjEX/ 

and x*(y; jSi) := (x* { (y;j6i), . . . ,x* M (y; jSi)). We call each minimization problem in l[8} a pri- 
mal subproblem. Note that we can use different smoothness parameters jSf in <[Sj for each 
i € {1, . . . ,M}. First, we recall the following properties of g(-;jSi), see 1101 . 

Lemma 3.1 For any /3i > 0, the function g(-;/3i) defined by <[Sj> is concave and differen- 
tiable. The gradient of g(-; jSi ) is given by V y g(y; /3i ) := Ax* (y; jSi ) — b which is Lipschitz 
continuous with a Lipschitz constant 

1 M II4-II 2 

L * (j8l):= J_£MiL. (io) 

PIS °i 

Moreover, we have the following estimates: 



g(y;Pi)-piDx<g(y)<g(y,pi), 



(ID 
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and 

g(yiPi)+Vy8&Pif(y-y)-!^\\y-n 2 <g&Pi),VyJ^ m - (12) 

Next, we consider the variation of the function g(y; ■) w.r.t. the parameter J3j. 

Lemma 3.2 Let us fix y € W l . The function g(y;-) defined by ([8]l is well-defined, nonde- 
creasing, concave and differentiable in R++. Moreover, the following inequality holds: 

g(y;ft) < g(y;ft) + (ft -ft)M**()>;ft)), pi, pi e R++, (13) 

where x* (y; /3i ) is defined by ([9]l- 

Proof Since g = g; and = L/^i PX;, it is sufficient to prove the inequality ( 113b for 
g;(y;-), with i = l,...,M. Let us fix y e R m and i e {l,...,M}. We define tyi{x;P\) := 
<f>i(x) +y T (AjXi — bi) + Pipxj(xi) a function of two joint variables x-, and P\. Since (j>i(-;-) is 
strongly convex w.r.t. x/ and linear w.r.t. j6i, g,-(y;j3i) := mui^x,-;/)]) is well-defined and 

concave w.r.t. jSi . Moreover, it is differentiable w.r.t. jSi and g(y; jSi ) = px, (x* (y; jSi )) > 0, 
where x*(y;Pi) is defined in Hence, g;(;y;-) is nonincreasing. By using the concavity of 
giiy;-) we have 

gi(j; ft ) < ft (y; ft ) + (ft - ft ) v ft «Cv; ft ) = «Ck ft ) + (ft - ft K (** to ft )) • 

By summing up the last inequality from i = 1 to M and then using l|7]l we obtain d 1 3 b - □ 

Finally, we consider a smooth approximation to 0. Let py(y) := jlbH 2 be a prox- 
function defined in W" with a convexity parameter a PY = 1 > 0. It is obvious that the prox- 
imity center of py is y c := 0'" € ffi m . We define the following function on X: 

<K*;ft) := max {(A*-fc) r y- f ||y|| 2 j , (14) 
veR'" ( 2 J 

where ft > is the second smoothness parameter. We denote by y*(x;pi) the solution of 
d 14b . From d 1 4b . we see that \jf(x; Pi) and y*(x; Pi) can be computed explicitly as 

v ( X ;Pi):=-±-\\Ax-b\\ 2 and y*(x;ft) := ±-{Ax-b). (15) 

Z/32 P2 

It clear that \jf(x; pi) > for all Now, we define the function f(x; Pi) as 

f(x;p 2 ):=$(x) + ¥ (x;pi). (16) 

Then, /(jt;ft) is exactly a quadratic penalty function of (0. The following lemma shows 
that /(■; Pi) is an approximation of (j). 

Lemma 3.3 The function \jf defined by ( 114b satisfies the following estimate: 

x l /(x;P 2 )< ¥ (x;Pi)+VM^P2) T (x-x) + Y J ^P 1 \\xi-X i \\\ Vx,ieX, (17) 

where Lf (Pi) := MMdL. Moreover, the function f defined by d 16b satisfies 

f(x-Pi) - ^-\\Ax-b\\ 2 = f(x;p 2 ) - ¥ (x;p 2 ) = «H*) < f{x- Pi), Vx e X. (18) 
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Proof By the definition of 1//, we have \ff(x\p\) — y{x;p\) — VxY(%',lh.) T (x— x) = ^ ||A (x— 
x)\\ 2 . Thus dl7l > follows from this equality by applying some elementary inequalities. The 
bounds d 1 8 b follow directly from the definition (I16t of /. □ 

3.3. Inexact solutions of the primal subproblem. Regarding the primal subproblem (|8}, if 
the objective function 0, has a specific form, e.g. univariate functions, then we can solve this 
problem analytically (exactly) to obtain a closed form solution. A simple example of such 
function is 0,-(jc,-) = |jq|. However, in most practical problems, solving the primal subproblem 
<[8j> exactly is only conceptual. In practice, we only solve this problem up to a given accuracy. 
In other words, for each i 6 {1,2,..., M} , the solution x* (y; J3i) in © is approximated by 

x*( y;j Si) :« argmin {>;(*;) +y T (Afc i -bi) + p l px } (x i )}, (19) 
in the sense of the following definition. 

Definition 3.1 We say that the point X- (y; jSi ) approximates x* (y; f5\ ) defined by up to a 
given accuracy e, > if: 

a) it is feasible to X;, i.e. x*(y; J3i) e X t \ 

b) Jc* (>■; jSi ) satisfies the condition: 

O^htf&M^M-hitf&p^M^^e?, (20) 

where /i,(x ; ;y,jSi) := <j>i(xi) +y r (A,x / - fe,-) + Pip Xj (xi). 

In practice, for a given accuracy > 0, we can check whether the conditions of Defini- 
tion l3.1l are satisfied by applying classical convex optimization algorithms, e.g. (sub)gradient 
or interior-point algorithms [25]. 

Since fej(-;y,j3i) is strongly convex with a convexity parameter jSiCT/ > 0, we have 

^ II^C^A) -^(^A)!! 2 < ak^CkA);^^!) -^(^(^A);^^) < ^e?, (21) 

where ;y, /3i ) is defined as in Definition l3.ll Consequently, we have: \\x* (y; /3i ) — x\ (y; jSi ) | < 
Ei for i = 1 , . . . ,M. Let x* (y; J3i) := (x\ (y; J3i) , . . . ,x* M (y; ft)) and 

V y g{y^ l ):=Ax*(y;^)-b. (22) 

The quantity V v g(-;j8i) can be referred to as an approximation of the gradient V v g(-;/3i) 
defined in Lemma [3~T1 If we denote by £ := (£1, . . . , £m) T the vector of accuracy levels then 
we can easily estimate 

\\%g(y;pi)-Vyg&Pi)\\ = \\A(x*(y-,li l )-x*(y-,li l ))\\ < \\A\\\\e\\. (23) 

3.4. Inexact excessive gap condition. Since problem i[T} is convex, under Assumption 
A I2. ll strong duality holds. The aim is to generate a primal-dual sequence {(Jc*, y k )}k>o such 
that for a sufficiently large k the point x k is approximately feasible to (0, i.e. \\Ax — b\\ < £ p , 
and the primal-dual gap satisfies (j> (x k ) — g(y k ) | < Ej for given tolerances > and e p > 0. 

The algorithm designed below will employ the approximate functions d~S~b— d 1 <5b to solve 
the primal-dual problems ([T}-(|2}- First, we modify the excessive gap condition introduced 
by Nesterov in |26| to the inexact case in the following definition. 
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Definition 3.2 A point (x,y) 6 X x W satisfies the inexact excessive gap (8-excessive gap) 
condition w.r.t. jSi > and ft > and a given accuracy 8 > if 

ffcpi)<g&Pi) + 8. (24) 

If 8 = then 1 124b reduces to the exact excessive gap condition considered in [26|. 

The following lemma provides an upper bound estimate for the primal-dual gap and the 
feasibility gap of problem (0. 

Lemma 3.4 Suppose that (x,y) 6 X x M!" satisfies the 8-excessive gap condition ( 124b . Then 
for any y* 6 Y*, we have 



^(x):=\\Ax-b\\ < J S 2 [||/|| + (||/||2 + ^Z) X + M) 1/2 ], (25) 
id — . / — \ 2 

-\\fW(x) < HX)-g(y) < 8+frDx-^- < 8 + fcDx. (26) 



Proof From the estimates i ll lb and d 1 8b we have 

<Hd-g(y) < f(x;h)-g(y;Pi)+hD x - -^\\Ax-b\\ 2 . (27) 

2jt>2 

Then, by using j24b . the last inequality implies the right-hand side of ( 126b . Next, for a given 
y* e Y* we have g(y) < max y g(y) = g(y*) = min A - eX {<j>(x) + {y*) T {Ax- b)} < <j>(x) + 
(y*) T (Ax-b) < <p(x) + ||y*||||Ax-Z?|j. Thus we obtain the left-hand side of Finally, 
the estimate d25b follows from l !26b after a few simple calculations. □ 

Let us define Ry := max,.*^* ||y*|| the diameter of Y*. Since Y* is bounded, we have 
< Ry* < +°°. The estimates l !25b and d26b can be simplified as 



&(x) < 2p2Ry*+ V2(^ifcDx+«fc) and -R Y *&{x)<<$>(x)-g(y) < frD x +8. (28) 



4 Inexact decomposition algorithm with one primal step and two dual steps 

In this section we first show that, for a given 8q > 0, there exists a point (Jr,y°) € X x R m 
such that the condition ( 124b is satisfied. Then, we propose a decomposition scheme to update 
successively a sequence {{x^ ,y k )}k>o that maintains the condition l !24b while it drives the 
sequences of smoothness parameters {ft* }k>o and {/3|}*>o to zero. 
Let us introduce the following quantities 



C d 



A\\ 2 D a + \ 



21 1/2 j 
1/2 



:= Mmax 



ft 



A T (^-fc)||, 
£ | 1 < i < m\ . 



(29) 



From ( 129b we see that the constant Q depends on the data of the problem (i.e. A, Z?x, cr, & 
and jc c ). Moreover, = ||e||. If we choose the accuracy = e > for all / = 1, . . . ,M, then 

[lEi< /2 e- 



£m = VMe and &„] 
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4.1. Finding a starting point. For a given a positive value ft > 0, let (x°,)P) be a point in 
X x M. m computed as 

V :=i*(0 ffl ;j S,), 

f :=L^^)-\A^-b), (M) 

where m 6 R" ! is the origin and i*(0 m ;ft) is denned by l[[9]l and Z/(ft) is given by i fTOl 
The following lemma shows that (i°,.y°) satisfies the (5rj-excessive gap condition ( 124b . The 
proof of this lemma is given later in Appendix lAl 

Lemma 4.1 The point (x°,y°) 6Xx K m generated by 030b satisfies the 8a-excessive gap 
condition 024b vv.^f. jSi and jt?2 provided that 

Pip2>L A , (31) 

wnere So := ft (ge^ + ifi^) > 0. 

Note that if we use jc*(0 m ;ft) instead of jt*(0'";ft) into (f30b . i.e. the exact solution 
x*(0 m ;ft) is used, then (Jc°,y°) satisfies the 0-excessive gap condition 024b . 

4.2. The inexact main iteration with one primal step and two dual steps. Let us assume 
that (x,y) is a given point in X x R m that satisfies the 5-excessive gap condition 024b w.r.t. 
ft , ft and 8. The aim is to compute a new point (x + ,y + ) such that the condition 024b holds 
for the new values ft + , j8 2 + and <5 + with ft + < ft, ft + < ft and 5+ < 5. 

First, for a given y and ft > 0, we define the following mapping 

^(y^O^argmaxjv^y^O^v-^-^^llv-yll 2 

where V v g(y;ft) is defined by 022b and L g (ft) is the Lipschitz constant. Since this maxi- 
mization problem is unconstrained and convex, we can show that the quantity G*(y;x,ft) 
can be computed explicitly as 

G* (y; ft ) := y + L« (ft ) " 1 V,.g(y ; ft ) = y + (ft ) - 1 [A? (y ; ft ) - b] . (32) 
Next, the main scheme to update (x + ,y + ) is presented as 



(i + ,y + ):=^(i,y,ft,ft,T)^ 



:(1-T)y+Ty*(^ft) 
(l-T)i+TF(y;ft) (33) 

G*(y;ft). 



Here, the smoothness parameters ft and ft and the step size T £ (0, 1) will be appropriately 
updated to obtain ft + , ftt and T + , respectively. Note that line 1 and line 3 in 033b are simply 
matrix-vector multiplications, which can be computed distributively based on the structure 
of the coupling constraints and can be expressed as 

y: = (l- T )y +T p-\Ax-b) and y+ := f+H (p^ 1 {Ax* {?, pi) -b). 

Only line 2 in 033b requires one to solve M convex primal subproblems up to a given accu- 
racy. However, this can be done in parallel. 
Let us define 

a*:=^ and a := El^lM , (34 ) 
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Then, by Assumption A 13. II we can see that 0<a*<a<l. We consider an update rale 
for /3i and p\ as 

J3+ := (1 - &z)Pi and /?+ := (1 - t)/3 2 . (35) 

In order to show that (x + ,y + ) satisfies the 5 + -excessive gap condition d24b . where 8 + will 
be defined later, we define the following function 



g. Q + (l-T)T^ + ||A||||v| 



[1]' 



(36) 



where £r CT j , Cd and La are defined in 

The next theorem provides a condition such that (x + ,y + ) generated by d33b satisfies the 
5 + -excessive gap condition d24b . For clarity of the exposition we move the proof of this 
theorem to Appendix IA1 



Theorem 4.1 Suppose that Assumptions A 12. H and A 13. H are satisfied. Let (x,y) 6 X x K m 
be a point satisfying the 8-excessive gap condition d24b w.r.t. two values jSi and JS2- Then if 
the parameter T is chosen such that T 6 (0, 1) and 

T 2 

P1P2 > U, (37) 

1 — T 

then the new point (x + ,y + ) generated by ( 133b is in X x W and maintains the 5 + -excessive 
gap condition (124b w.r.t two new values jS; + and j6 2 + defined by ( 135b , where 5+ := (1 — t)5 + 
r\(T,fi\,p\,y,£) with rj(-) defined by J36b . 

4.3. The step size update rule. Next, we show how to update the step size t 6 (0,1). 
Indeed, from Q7b we have fiifc > yr^A- By combining this inequality and Q5b we have 

/3+/3+ = (1 - t)(1 - Bct)PiP 2 > (1 - «t)t 2 L a . In order to ensure /3, + /3 2 + > t^La we 

require (1 — cct)t z > Since T, T+ e (0, l) and a € (0, 1], we have 

< t+ < 0.5t I [(1 - «t) 2 t 2 + 4(1 - otT)] 1/2 - (1 - cct)t| < T. 

Hence, if we choose t + = 0.5t [[(1 - 6:t) 2 t 2 + 4(1 — git)] 1 / 2 — (1 — o:t)t] then we obtain 
the tightest rule for updating T. Based on the above analysis, we eventually define a sequence 
{tk}k>o as follows: 

:= y { [(1 - «rt) 2 T 2 +4(1 - fikT*)] 1/2 - (1 - a t T,)T,} , Vfc > 0, (38) 

where t € (0, 1) is given and a k := px{x*{(y) k \ J3f))/D x e [a*, 1]. 

The following lemma provides the convergence rate of the sequence {ik}, whose proof 
can be found in Appendix |A1 

Lemma 4.2 Suppose that Assumption A 13. W is satisfied. Let {Tk)k>0 be a sequence gener- 
ated by 08b for a given To such that < To < [max{l, a*/(l — a*)}] -1 . Then 

<*k< , „!w , : tz ■ (39) 



fe+l/To - 0.5(1 + a*)fc+l/V 
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Moreover, the sequences {/3f }^>o and {jS^ }k>0 generated by ( 135b satisfy 



(40) 



/or a ykeif positive constant 7. 



Remark 4.1 The estimates d39b of Lemma l4.2l show that the sequence {tx} converges to zero 
with the convergence rate O(i). Consequently, by d40b . we see that the sequence {ji k ji k } 
also converges to zero with the convergence rate 0(jz). From ( 13 U and ( 1371 ). we can derive 

an initial value To := •T 1 ■ 

In order to choose the accuracy for solving the primal subproblem I ll9l >, we need to 
analyze the formula ((36}. Let us consider a sequence {r/^x) computed by 

:= tj (t*, ft*, J3f ,/,£*), 
where 77 is given in d36l >. The sequence {4}i>0 defined by 

4+1 :=(l-%)4 + % = 4 + (%-%4), V£>0, (41) 
where 8q is chosen a priori, is nonincreasing if 77 ^ < Ti4 for all k > 0. 

Lemma 4.3 accuracy E k at the iteration k of Algorithm\T\below is chosen such that 

< ef < £* := ^for i= 1,... ,M, wfere 



7 1=1 



Q + (1-t,)t, Ul + ||A||||y' 



(42) 



L A *' "VjS| 

rten rte sequence {4}/t>0 generated by d4 1 b w nonincreasing. 

Proof Since <£*<£*< 1 for all = 1, . . . ,Af, we have eL < \/Me k and (eL) 2 ^ 
(L/=i ct ') (£ k ) 2 — (X)f=l °i) 6*- B y substituting these inequalities into OBI of 77 and then 
using J42b and the notation r\k = T] (tk,Pi ,P% > y*i e *)' we nave 

1* < 

On the other hand, from (141 b we have = 4 + (^Jt — 1*4) f° r al l £ > 0. Thus, {4}/i>0 
is nonincreasing if 77^ — Tjt4 < for all A: > 0. If we choose such that Qk£ k < Tfc4> i- e - 
e*< ^,thenTjt<Tt4. □ 

From Lemma l4~3l it follows that if we choose Ek sufficiently small, then the sequence 
{{x k ,y k )} generated by (x k+1 ,y k+l ) := y d {^,y*,^ +l ,p^,T k ) maintains the 8k+i -excessive 
gap condition ( 124b with 8k+i < 4 f° r a U k. Now, by using Lemmas l3.4l and l4~Tl if we choose 
e° in Lemma 1431 such that e° := where 

c °- =ft [-TT + it*)' (43) 
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and e > is a given accuracy level, then the condition d24b holds with 8 = £. 

4.4. The algorithm and its convergence. Finally, we present the algorithm in detail and 
estimate its worst-case complexity. For simplicity of discussion, we fix the accuracy at one 
level e k for all the primal subproblems. However, we can alternatively choose different ac- 
curacy for each subproblem by slightly modifying the theory presented in this paper. 

Algorithm 4.1. (Inexact decomposition algorithm with two dual steps). 
Initialization: Perform the following steps: 

Step 1: Provide an accuracy level e > for solving <[8j and a value /3° > 0. Set To := 

0.5(V5-1), ft° :=J3° and &° := Jfr. 

Step 2: Compute Co by HUl. Set e° := e/Co and do := e. 

Step 3: Compute x° and f from (30} as x° := jc*(0 m ;j3 1 °) andf := U(jffl)- l {A& -b) 
up to the accuracy e°. 

Iteration: For k = 0, 1, 2, . . . , k mdx , perform the following steps: 

Step 7: If a given stopping criterion is satisfied then terminate. 

Step 2: Compute Q k by (|42]>- Set e k := xAjQk and update 8 k +i := (1 - Tk)S k + Qk£ k - 
Step 3: Solve the primal subproblems in ([8} in parallel up to the accuracy e k . 
Step4: Compute (x k+ \y k+1 ) := ^ d (x k ,f,^,^,T k ) by 

Step 5: Compute a k :=p x (x* (f; J3f )) /D x , where y* : = ( 1 - r k )f + x k (J3|) " 1 (Ax* - 6) . 
Step 6: Update jSf +1 := (1 - 6c k X k )Pi and /3| +I := (1 - T k )$. 

Step 7: Update T k as t*+i := 0.5t* { [(1 - a k T k ) 2 T^ + 4(1 - fifef*)] 1/2 - (1 - a^)^}. 
End. 

The stopping criterion of Algorithm [T] at Step 1 will be discussed in Section [6] The 
maximum number of iterations fc max provides a safeguard to prevent the algorithm from 
running to infinity. 

The following theorem provides the worst-case complexity estimate for Algorithm [TJ 
under Assumptions A 12. H and A 13. II 

Theorem 4.2 Suppose that Assumptions A 12. H and A 13. H are satisfied. Let {(x*,y*)} be a 
sequence generated by Algorithm\I\after k iterations. If the accuracy level £ in Algorithm^]] 
is chosen such that < £ < - g^yg" f or some positive constant cq, then the following 
primal-dual gap holds 

-R Y ^(x* +1 ) < H^ +1 )-g(f +1 ) < (/3 ° D *+ Co) , (44) 

V ) s\y [0.5(^5 -l)/fc+l]«* 

and the feasibility gap satisfies 

J?(x i+l ) = \\Ax k+l -b\\< = ^ = , (45) 

1 ' _ 0.25(V5-l)(l + a*)*+l 



whereCf- (3 - y/S)^Ry +0.5(v / 5- l)y/L A (D x +c /P ) and R Y * is defined by I 

Consequently, the sequence {(x* ,y k )} k >o generated by Algorithm\I\converges to a solu- 
tion (x*,y*) of the primal and dual problems (Q~|l-(|2]l as k — V °° and £ —> + . 
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Proof From Lemmal34l we have ^(x k+1 ) < 2(5 k+l R Y * + ^ 2fi k+l fi k+x D x + ^2p k+1 8 k+1 

and (j)(x k+l ) — < fi k+1 D x + 8k+i- Moreover, 4+1 < 8q = £ due to the choice of <5o 

and the update rule of 8k at Step 2 of Algorithm[TJ By combining these inequalities and d40fc 
and then using the definition of Cf and To = 0.5 (v5 — 1) we obtain d44b and d45b . The last 
conclusion is a direct consequence of d44t and ( 145b . □ 

The conclusions of Theorem 14.11 show that the initial accuracy of solving the primal 
subproblems $Q needs to be chosen as 0(1 /k). Then, we have \<p(x k ) — g(y k )\ = 0(l/k a ) 
and &(x k ) = 0(1 /k). Thus, if we choose the ratio a* such that a* — > 1 then we obtain an 
asymptotic convergence rate 0(1/ k) for Algorithm [TJ We note that the accuracy of solving 
^ has to be updated at each iteration k in Algorithm [TJ The new value is computed by 
£ k = xA/Qk at Step 2, which is the same 0(1 /k 2 ) order. 

Now, we consider a particular case, where we can get an 0(1 /e) worst-case complexity 
(e is a desired accuracy). 

Corollary 4.1 Suppose that the smoothness parameter jSf in Algorithm\T]is fixed at jSf = 
jS° = y/L^Ef for all k > 0. Suppose further that the accuracy level £ in Algorithm Q] is 

chosen as 0(b) and that the sequence {t^} is updated by Tk+i '■= 0.5t^ T^ +4 — Tkj 

starting from To := 0.5(\/5 — 1). Then, after k= [2/e/J + 1 iterations, one has 

< C° f S f and |0(x*) -g(f)\ < C%£ f , (46) 

where Cj := ^L^(2R Y * + y/2D~x~) and C° d := v/EImax {D X ,2R Y * + V^Dx}- 

Proof If we assume that f3 k is fixed in Algorithm[2]then, by the new update rule of {t^} we 
have = L a t| < , 4L ^L due to 03 and gO} with a* = 0. Since jSf = y/LZe f , if we 

choose := [2/e/J + 1 then - ^^ +2 < e /- Furthermore, by Lemma |3~4| we have &(£) < 

2$R Y * + yj2fl$D x < VU(2R Y * + V2D^)e f and -R Y ^(x k ) g(y*) < jSfDx = 

V^aOxE/- By combining these estimates, we obtain the conclusion d46l >. □ 

Remark 4.2 (Distributed implementation) In Algorithm [TJ only the parameter requires 
centralized information. Instead of using Cfy, we can use its lower bound a* to compute T* 
and j3 k . In this case, we can modify Algorithm[TJto obtain a distributed implementation. The 
modification is at Steps 5, 6 and 7, where we can parallelize these steps by using the same 
formulas for the all subsystems to compute the parameters jSf , f3 k and Tj. We note that the 
points x*(y;Pi) and x k+l in the scheme Q3t can be computed in parallel, while y*(x;f$i) 
and y + can be computed distributively based on the structure of the coupling constraints of 
problem 10. 

5 Inexact decomposition algorithm with switching primal-dual steps 

Since the ratio a* := jj£ defined in d34t may be small, Algorithm QJ only provides a sub- 
optimal approximation (x k ,y k ) to the optimal solution (x*,y*) such that \<j>(x*) — g(y k )\ < 
j3°D x +£ in the worst-case. For example, if we choose the prox-function px(x) '■= jTdLi \\xi~ 
jc^ 1 1 2 + a*, where a* € (0,0.5), then worst-case complexity of Algorithm [Tj is lower than 
subgradient methods, see d44b of Theorem l4.2l Algorithm QJleads to a poor performance. 
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In this section, we propose to combine the scheme defined by d33 b in this paper and 
an inexact decomposition scheme with two primal steps and one dual step to ensure that the 
parameter ft always decreases to zero. Apart from the inexactness, this variant allows one 
to update simultaneously both smoothness parameters at each iteration. 

5.1. The inexact main iteration with two primal steps. Let us consider the approximate 
function /(x;ft) = <j)(x) + )|/(x;ft) defined by H6\ , We recall that (j) is only assumed to 
be convex and possibly nonsmooth, while ft) is convex and Lipschitz continuously 
differentiable. We define 

? K^;i,jS2):=^(^0+^V(^j82)+V,,.v/(i;ft) r (^-i/) + ^y^||x i -i / || 2 , (47) 
and the mapping 

^■(x,ft) :=argnim<7,(x;;x,ft), i=l,...,M, (48) 

where /.^(ft) '■= M ^ is the Lipschitz constant of V AV ; JS2) defined in Lemma [33] 
Since g,(-;x, ft) is strongly convex, />,(x,ft) is well-defined. 

[Via \ 

Remark 5.1 Note that we can replace the quadratic term ' ^ |[x,- — x;|[ 2 in i47b by any 
Bregman distance as done in 1261 . However, the convergence analysis based on this type of 
prox-functions is more complicated than the one given in this paper. 

Suppose that we can only solve the minimization problem d48b up to a given accuracy 
e, > to obtain an approximate solution f}(-,ft) in the sense of Definition ( 13. 1 b . More 
precisely, P, (i,ft) 6 X,- and 

< ©$(*.ft);*»ft) - ©fa&ft);*. ft) < iif(&)e?. i = i,...,M. (49) 

We denote P : = (P\ , . . . , Pm ) and P : = (P\ , . . . , Pm ) ■ In particular, if is differentiable and its 
gradient is Lipschitz continuous with a Lipschitz constant L Vi > for some i £ {1,2,..., M} 
then one can replace the approximate mapping Pi by the following one: 

Gi(*,ft) :«arg^|[V^KiO+Vx,-^;J32)] r (^-^0+^^IN-^ll 2 } > 

where L,(ft) := lP< +L, v '(ft), in the sense of Definition 13.11 Note that the minimization 
problem defined in G, is a quadratic program with convex constraints. 

Now, we can present the decomposition scheme with two primal steps in the case of 
inexactness as follows. Suppose that (x,y) 6 X x R m satisfies d24b w.r.t. ft, ft and 8. We 
update (x + ,y+) eXx R m as 

(x :=(1-T)jc + Ti*(y;ft) 
(x+,y+) := ^(x,y,ft,ft+,T) o <^ y+ := (J - T)y + Ty*(*;j3+) (50) 

[x+ :=P(x,p+), 

where the step size T£ (0, 1) will be appropriately updated and 

1. the parameters ft and ft are updated by ft + := (1 — r)ft and ft^ := (1 — T)ft; 

2. x*(y; ft) is computed by l [T9l >; 

3. P(-,ft. + ) is an approximation of P(-,ft + ) defined in | |4"8| | and 1 1491 . 
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The following theorem states that the new point (x + ,y + ) updated by y p maintains the 8 + - 
excessive gap condition ( 124b . The proof of this theorem is postponed to Appendix lAl 

Theorem 5.1 Suppose that Assumptions A I2. H and A I3. W are satisfied. Let (x,y) be a point 
in X x W and satisfy the 8 -excessive gap condition ( 124b w.r.t. two values jSi and p\. Then 
if the parameter T is chosen such that T £ (0, 1) and 

jSijS 2 > (rz^j l a, (5i) 

then the new points (x + ,y + ) updated by d50b maintains the 8+-excessive gap condition ( 124 b 
W.r.t. two new values jSj + andfi^, where 8 + := (1 - t)5 + 2jSi (1 - r)D a E[ a ] + \ Y%=\ Lji.fii)^' 
and g[ CT ] and D a are defined in ( 129b . 

Finally, we note that the step size T is updated by z k+ \ := T k /(z k +l) for k > starting 
from To := 0.5 in the scheme ( 150b . see 135 ] for more details. 

5.2. The algorithm and its convergence. First, we provide an update rule for 8 in Definition 
13.21 With Bui and Da defined in l !29b . let us consider the function 

i M 

§(t,Pi, fc,e) := 2/3i(l - T)D a e [a] + - £z* (/3+)g?, (52) 

L i=i 

and a sequence {8k} generated by 8k+\ := (1 — T k )8 k + |(Ti,j3f,j3|,e*), where 8q is given 
and e k is chosen appropriately. The aim is to choose E k such that < ef < 6k and {8k} is 
nonincreasing. By letting 

/ M \ 1/2 M M 

R k :=2(l-r k )tfD a [Zoj + -^_^|>f > (53) 

Then, if we choose e k > such that e k < ^jp- then we have 8k+\ < 8 k . 

By combining both schemes 03b and d50b . we obtain a new variant of Algorithm[T]with 
a switching strategy as described as follows. 

Algorithm 5.2. [Inexact decomposition algorithm with switching primal-dual steps). 
Initialization: Perform as in Algorithm [T] with To := 0.5. 
Iteration: For k = 0,1,2, ... , fc max perform the following steps: 

Step 7: If a given stopping criterion is satisfied then terminate. 
Step 2: If k is even then perform the scheme with two primal steps: 

2.1. Compute R k by l l53l . Set e k := T k 8k/Rk and update 8 k+ \ := (1 — Tk)8k+Rk£ k - 

2.2. Update $ k+l := ( 1 - z k ) J3|. 

2.3. Compute := yP(x k ,f,$ k ,j3| +1 ,T jt ) up to the accuracy g*. 

2.4. Update jSf +1 := (1 - T*)j3f . 

2.5. Update the step-size parameter %k as ?k+i '■= ^+T- 

Step 3: Otherwise, (i.e. k is odd) perform the scheme with two dual steps: 

3.1. Compute Q k by . Set 6* := Xk8kjQk and update 8k+i := (1 - T k )8 k + Qk£ k - 

3.2. Compute (x k+1 ,y k+l ) := y d (x k ,y k ,p k ,p k ,T k ) up to the accuracy e k . 

3.3. Compute a k := px{r ^ ) \ where f := (1 - r k )f + i k {$)- 1 (Ax! 1 - b). 

3.4. Update /3f +1 := (1 - a k T k )P k and jS| +I := (1 - T*)j3f. 
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3.5. Update T k as v k+1 := f { [(1 - a k T k ) 2 tf +4(1 - a***)] 1/2 - (1 - a^)^}. 

End. 

Note that the first line and third line of the scheme ,9 >p can be parallelized. They require 
one to solve M convex subproblems of the form ([8} and ( 149) , respectively in parallel. If the 
function fa is differentiable and its gradient is Lipschitz continuous for some i £ {1,... ,M}, 
then we can use the approximate gradient mapping G, instead of P; and the correspond- 
ing minimization subproblem in the third line reduces to a quadratic program with convex 
constraints. The stopping criterion at Step 1 will be given in Section[6] 

Similar to the proof of Lemma |4~21 we can show that the sequence {t k } k> Q generated by 
Step 2.5 or Step 3.5 of Algorithm [2] satisfies estimates ( 139) . Consequently, the estimate for 

jS| in l[40]l is still valid, while the parameter jSf satisfies /3f +1 < (T k+ ^l\+ a *)i2 - 

Finally, we summarize the convergence results of Algorithm |2]in the following theorem. 

Theorem 5.2 Suppose that Assumptions A Yl.W and A 13. W are satisfied. Let {(x k ,y k )} be a 
sequence generated by Algorithm\2\after k iterations. If the accuracy level £ in Algorithm^ 
is chosen such that < £ < Q ^ for some positive constant cq, then the following primal- 
dual gap holds 

- R Y ^(x i+l ) < $ (# +1 ) - g(y k+l ) < - fgg+f° (54) 
v j-r\ (0.5k + 1)( 1 +« )/ 2 



and the feasibility gap satisfies 



^(^ +1 ) = \\A^ +1 -b\\<——^— , (55) 

v ; 11 11 - 0.25(1 + a* )Jfc+l' 



where Cf := Lk ^a* +0-5\/2La(Z3a' +cq/Pq) and Ry* is defined as in I 

Consequently, the sequence {(x* ,y k )} k >o generated by Algorithm\2\converges to a solu- 
tion (x* ,y*) of the primal and dual problems (0-(|2]l as k —> °° and £ —> + . 



The proof of this theorem is similar to Theorem 14. 2l and thus we omit the details here. 
We can see from the right hand side of (154) in Theorem 15. 2 1 that this term is better than the 
one in Theorem 14.21 Consequently, the worst case complexity of Algorithm [2] is better than 
the one of AlgorithmQ] However, as a compensation, at each even iteration, the scheme Sf p 
is performed. It requires an additional cost to compute x + at the third line of S fp '. As an 
exception, if the primal subproblem (|8} can be solved in a closed form then the cost-per- 
iteration of Algorithm |2]is almost the same as in Algorithm [T] 

Remark 5.2 Note that we can only use the inexact decomposition scheme with two primal 
steps ,y p in fl50b to build an inexact variant of [35 Algorithm 1]. Moreover, since the role 
of the schemes 5^ p and 5^ d is symmetric, we can switch them in Algorithmic 



6 Numerical tests 

In this section we compare Algorithms [T]and|2]derived in this paper with the two algorithms 
developed in [35, Algorithms 1 and 2] which we named 2pDecompAlg and pdDecompAlg, 
the proximal center-based decomposition algorithm in 0221 , an exact variant of the proximal 
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based decomposition algorithm in [4| and three parallel variants of the alternating direc- 
tion method of multipliers (with three different strategies to update the penalty parameter). 
We note that these variants are the modifications of the algorithm in 1201 . and they can be 
applied to solve problem (0 with more than two objective components (i.e. M > 2). We 
named these algorithms by PCBDM, EPBDM, ADMM-vl, ADMM-v2 and ADMM-v3, respectively. 
For more simulations and comparisons we refer to the extended technical report [?]. 

The algorithms have been implemented in C++ running on a 16 cores Intel ®Xeon 
2.7GHz workstation with 12 GB of RAM. In order to solve the general convex program- 
ming subproblems, we either used a commercial software called Cplex or an open-source 
software package IpOpt 1391 . All the algorithms have been parallelized by using OpenMP. 

In the four numerical examples below, since the feasible set X, has no specific structure, 
we chose the quadratic prox-function px t (xi) : = i | j x t — x° t \ | 2 + r,- in the four first algorithms, 
i.e. Algorithms[T]and|2] 2pDecompAlg and pdDecompAlg, where x^ 6 R"' and r, = 0.75Z)x, 
are given, for i = 1,. ..,M, as mentioned in Remark [3~T1 With this choice we can solve the 
primal subproblem (|8} in the first example by using Cplex. 

We terminated these algorithms if 

rpfgap:=||A/-fe||/max{||fc||,l} < 1(T 3 , (56) 
and either the approximate primal-dual gap satisfied 

\f(x*;$)-g(f;tf)\ < 10- 3 max{l.O,|g(/;/3f)|,|/(/;/3|)|}, 

or the value of the objective function did not significantly change in 5 successive iterations, 
i.e.: 

\i>{x*) -0(jc*-;)|/max{l.O,|0(**)|} < 10~ 3 for j= 5. (57) 

Here g(y*;j3f) is the approximate value of g(y*;/?*) evaluated ati;*^*;^). 

In ADMM-vl and ADMM-v2 we used the update formula in [3 formula (21)] to update the 
penalty parameter p^ starting from po := 1 and po := 1000, respectively. In ADMM-v3 this 
penalty parameter was fixed at p^ := 1000 for all iterations. In PCBDM, we chose the same 
prox-function as in our algorithms and the parameter /3i in the subproblems was fixed at 

j3i := £ v mix { l ®^ x )l} _ w e terminated all the remaining algorithms if the both conditions 
d56l > and d57l > were satisfied. The maximum number of iterations maxiter was set to 5000 
in all algorithms. We warm-started the Cplex and IpOpt solvers at the iteration k at the 
point given by the previous iteration k— 1 for k > 1. The accuracy levels in Cplex and 
IpOpt and 8k were updated as in Algorithms Q] and [2] starting from 3q = 10~ 3 and then 
set to max-fe^-, 10~ 10 }. In other algorithms, this accuracy level was fixed at Ek = 10~ 8 . We 
concluded that "the algorithm is failed" if either the maximum number of iterations maxiter 
was reached or the primal subproblems (|8} could not be solved by IpOpt or Cplex due to 
numerical issues. 

We benchmarked all algorithms with performance profiles [7 |. Recall that a perfor- 
mance profile is built based on a set 5? of n s algorithms (solvers) and a collection 3? 
of tip problems. Suppose that we build a profile based on computational time. We denote 
by T ps : = computational time required to solve problem p by solver s. We compare the per- 
formance of algorithm s on problem p with the best performance of any algorithm on 

T 

this problem; that is we compute the performance ratio r PlS := gggry jj ,; g .y} ■ Now, let 
p s (f ) := J-size [p e S? r p , s < f } for f eR + . The function p s : R — > [0, 1] is the prob- 
ability for solver s that a performance ratio is within a factor f of the best possible ratio. We 
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use the term "performance profile" for the distribution function p s of a performance metric. 
In the following numerical examples, we plotted the performance profiles in log 2 -scale, i.e. 
A*( T ) : = ^ size {/> e & I lo g2(^) < T:=log 2 f}. 

6.1. Basic pursuit problem. The basic pursuit problem is one of the fundamental problems 
in signal processing and compressive sensing. Mathematically, this problem can be formu- 
lated as follows: 

min |Ulh 

' " , (58) 
s.t. Ax = b, 

where A e W" x " and b e R m are given. Since 0(x) = \\x\\i = Z"=i M x i) = I"=i W> the 
primal subproblem (|8} formed from (158) in the algorithms can be expressed as 



min | \ Xi | + (Afy) Xi + ^ fa - 4') 2 } ■ 



This problem can be solved in a closed form without any subiteration. We implemented 
Algorithms [T] and [2] to solve this problem in order to compare the effect of the parameter 
a* on the performance of the algorithm. The data of this problem is generated as follows. 
Matrix A is generated randomly such that it is orthogonal. Vector b := Ax°, where x° is 
a fc-sparse random vector (k = [0.05nJ ). We tested Algorithms [T] and [2] with 5 problems 
and the results reported by these algorithms are presented in Table [T] with a* = 0.25 and 
a* = 0.75. As we can see from this table that Algorithm [2] performs better than Algorithm 



Table 1 Performance comparison of Algorithms 1 and 2 for solving ( I58t (This test was done on a MAC book 
Laptop (Intel 2.6GHz core i7, 16GB Ram). The information in rows 3, 4, 6 and 7, and columns 2-6 is the 
number of iterations / the computational time in second 



[("»,«)] 




(20,124) 


(50,128) 


(80,256) 


(100, 680) 


(100, 1054) 








a* — 


).25 






Algorithm 
Algorithm 


1 

2 


8254/1.3858 
4836/0.9025 


5090/0.8769 
4744/0.8808 


10144/2.2876 
8060/1.9809 


29773/7.8386 
13220/3.7619 


35615/10.9315 
12348/3.7967 








a* = 


).75 






Algorithm 
Algorithm 


1 

2 


7115/1.1851 
15016/2.6689 


6644/1.1632 
14284/2.6121 


8749/1.9632 
17140/4.0286 


14927/4.0424 
34048/9.6910 


16128/4.9169 
36668/11.1801 



[T]in terms of number of iterations as well as computational time for the case a* = 0.25. In 
the case a* = 0.75, Algorithm [T]performs better than Algorithm|2] This example claims the 
theoretical results. 

6.2. Nonsmooth separable convex optimization. Let us consider the following simple 
nonsmooth convex optimization problem: 

in 0(x) :=L" = i'1 x i-x?l> 

K" ' (59) 

t- £"=i*( = b, Xj eX t , i= l,...,n, 

where b, x° € K are given (i= 1 ,...,«) . Let us assume that x; 6 X,- : = [/,■ , m,] is a given interval 
in R. Then, this problem can be formulated in the form of Q~|with M = n. Since the Lagrange 
function J?(x,y) = £" =1 [i\xi — xf\ +y(x,- — b/n)\ is nonsmooth, where y 6 R is a Lagrange 
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multiplier, we choose px,(-*;) := jll-*/ - +0.75Dx,- such that the primal subproblem can 
be written as 



gi(y,pl):= min {i\x i -X?\+y(x i --) + ^\x i -x c i \ 2 +0J5D x \, 



(60) 



where /3i > 0. Now, we assume that we can choose the interval [/,-, m,] sufficiently large such 
that the constraint Xi 6 [//, m,] is inactive. Then, the solution of problem d60t can be computed 
explicitly as x*(y;j3i) := Vi(xf ,xf,y,fii,i), where the soft-thresholding-type operator Vi is 
defined as follows: 



V / (*f,*f,y,A,r): 



if4-ft (r+y)>4> 

4+jSf 1 (y-y) ifxf + /3f 1 (r-y)<^, (61) 



In this example, we tested five algorithms: Algorithm[T] Algorithm|2] 1 35 , Algorithm 1], |j35l 
Algorithm 2] and PCBDM for 10 problems with the size varying from n = 5 to n = 100,000. 
Note that if we reformulate ( 159) as a linear programming problem (LP) by introducing slack 
variables, then the resulting LP problem has 2n variables and 2n+ 1 inequality constraints. 

The data of these tests were created as follows. The value c was set to b = 2n, x" := 
(jcf, . . . ,x%) , where x' a := i — n/2. The maximum number of iterations maxiter was in- 
creased to 10,000 instead of 5,000. The performance of the five algorithms is reported in 
Table[2] Here, iters is the number of iterations and time is the CPU time in seconds. 



Table 2 Performance comparison of five algorithms for solving ( 1591 





Algorithm performance and results 


Size [n] 


5 


10 


50 


100 


500 


1,000 


5,000 


10,000 


50,000 


100,000 




2pDecompAlg 


226 


184 


704 


843 


1211 


1277 


1371 


1387 


1408 


1409 




Algorithm 1 


1216 


925 


377 


552 


1092 


1209 


1385 


1422 


1374 


1352 


iters 


Algorithm 2 


452 


334 


544 


794 


1142 


1228 


1415 


1433 


1358 


1368 




pdDecompAlg 


612 


458 


830 


887 


1253 


1341 


1451 


1428 


1487 


1446 




PCBDM 


62 


123 


507 


1036 


3767 


3693 


6119 


5816 


3099 


3285 




2pDecompAlg 


0.0143 


0.0105 


0.0339 


0.0495 


0.0809 


0.1078 


0.2969 


0.5943 


2.5055 


4.9713 




Algorithm 1 


0.0592 


0.0418 


0.0170 


0.0266 


0.0596 


0.0827 


0.2477 


0.4544 


2.1970 


4.3869 


time 


Algorithm 2 


0.0244 


0.0166 


0.0222 


0.0406 


0.0737 


0.0909 


0.3522 


0.4646 


2.0875 


4.2659 




pdDecompAlg 


0.0316 


0.0199 


0.0351 


0.0450 


0.0716 


0.0979 


0.3013 


0.4416 


2.2879 


4.3119 




PCBDM 


0.0027 


0.0036 


0.0218 


12.1021 


0.2116 


0.2232 


1.1448 


1.3084 


3.0277 


6.3322 



As we can see from Table [2] Algorithm [TJ is the best in terms of number of iterations 
and computational time. Algorithm [2] works better than pdDecompAlg. The first four algo- 
rithms have consistently outperformed PCBDM in terms of number of iterations as well as 
computational time in this example. 

6.3. Separable convex quadratic programming. Let us consider a separable convex quadratic 
program of the form: 

' min {(j)(x) := Y!t\ \A QiXi + q] '-*/} , 

' tt Lti A iXi = b, (62) 
Xj>0, i = 0,...,M. 
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Here Q, 6 K" , x "' is a symmetric positive semidefinite matrix, cji 6 W, A; 6 M. mxn * for /' = 
1,...,M and b 6 R m . In this example, we compared the above algorithms by building their 
performance profiles in terms of number of iterations and the total computational time. 

Problem generation. The input data of the test was generated as follows. Matrix Q, :=RiRf , 
where R, is an x r,- random matrix in [Iq,uq] with r, := |nj/2j. Matrix A,- was generated 
randomly in [Za,Ma]- Vector q± := — Qix®, where x° is a given feasible point in (0,r XQ ) and 
vector b := Y^Li^i^ ■ The density of both matrices A,- and Rj is Note that the problems 
generated as above are always feasible. Moreover, they are not strongly convex. The tested 
collection consisted of n p = 50 problems with different sizes and the sizes were generated 
randomly as follows: 

- Class I: 20 problems with 20 < M < 100, 50 < m < 500, 5 < n, < 100 and y A = 0.5. 

- Class 2: 20 problems with 100 < M < 1000, 100 < m < 600, 10 < n, < 50 and y A = 0. 1 . 

- Class 3: 10 problems with 1000 < M < 2000, 500 < m < 1000, 100 < < 200 and 
y A = 0.05. 

Scenarios. We considered two different scenarios: 

Scenario I: In this scenario, we aimed at comparing Algorithms [TJand|2] 2pDecompAlg, 
pdDecompAlg, ADMM-vl and EPBDM, where we generated the values of Q relatively small. 
More precisely, we chose [Iq,uq] = [—0.1,0.1], [Ia, ma] = [— 1, 1] and r l0 = 2. 
Scenario II: The second scenario aimed at testing the affect of the matrix A and the up- 
date rule of the penalty parameter to the performance of ADMM. We chose [Iq,uq] = [—1, 1], 
[l A , u A ] = [-5,5] andr VQ =5. 

Results. In the first scenario, the size of the problems satisfied 23 < M < 1992, 95 < m < 991 
and 1111 < n < 297818. The performance profiles of the six algorithms are plotted in Figure 
[TJwith respect to the number of iterations and computational time. 




Fig. 1 Performance profiles in log 2 scale for Scenario I by using IpQpt: Left-Number of iterations, Right- 
Computational time. 



From these performance profiles, we can observe that the AlgorithmQ] Algorithm|2] 
2pDecompAlg and pdDecompAlg converged for all problems. ADMM-vl was successful in 
solving 36/50 (72.00%) problems while EPBDM could only solve 9/50 (18.00%) problems. 
It shows that Algorithm [TJ is the best one in terms of number of iterations. It could solve up 
to 38/50 (76.00%) problems with the best performance. ADMM-vl solved 10/50 (20.00%) 
problems with the best performance, while this ratio was only 2/50 (4.00%) and 1/50 
(2.00%) in pdDecompAlg and Algorithm [2] respectively. If we compare the computational 
time then Algorithm [TJ is the best one. It could solve up to 43/50 (86.00%) problems with 
the best performance. ADMM-vl solved 7/50 (14.00%) problems with the best performance. 



Fast Inexact Decomposition Algorithms For Large-Scale Separable Convex Optimization 



21 



Since the performance of Algorithms [T]and [2] 2pDecompAlg, pdDecompAlg and ADMM 
are relatively comparable, we tested Algorithms [T] and [2] 2pDecompAlg, pdDecompAlg, 
ADMM-vl, ADMM-v2 and ADMM-v3 on a collection of n p = 50 problems in the second sce- 
nario. The performance profiles of these algorithms are shown in Figure[2] From these per- 




Fig. 2 Performance profiles in log 2 scale for Scenario II by using Cplex with Simplex method: Left-Number 
of iterations, Right-Computational time. 

formance profiles we can observe the following: 

- The six first algorithms were successful in solving all problems, while ADMM-v3 could 
only solve 16/50 (32%) problems. 

- Algorithm [T] and ADMM-vl is the best one in terms of number of iterations. It both 
solved 18/50 (36%) problems with the best performance. This ratio is 17/50 (34%) 
in ADMM-v2. 

- Algorithm[T]is the best one in terms of computational time. It could solve 48/50 (96%) 
the problems with the best performance, while this quantity is 2/50 (4%) in ADMM-v2. 

6.4. Nonlinear smooth separable convex programming. We consider the following non- 
linear, smooth and separable convex programming problem: 

( ( M 1 1 

min {*(*):=£- - -x?) - w, ln(l + bfxt) , 

£a , (63) 

s.t. l^AiXi = b, 

xi y 0, i = 1, . . . ,M. 

Here, g; is a positive semidefinite and x' is given vector, i = 1, . . . ,M. 

Problem generation. In this example, we generated a collection of n p = 50 test problems as 
follows. Matrix g, is diagonal and was generated randomly in [Iq,uq\. Matrix A, was gener- 
ated randomly in [Za,ma] wltn the density Vectors h; and vv,- were generated randomly in 
[lb,Ub] and [0, 1], respectively, such that w; > and Y4L1 w i = 1- Vector b := Y!iL\^i x °i f° r a 
given x° in [0, r XQ ], The size of the problems was generated randomly based on the following 
rules: 

- Class 1: 10 problems with 20 < M < 50, 50 < m< 100, 10 < «, < 50 and y A = 1.0. 

- Class 2: 10 problems with 50 < M < 250, 100 < m < 200, 20 < n,- < 50 and J A = 0.5. 

- Class 3: 10 problems with 250 < M < 1000, 100 < m < 500, 50 < n,- < 100 and y A = 0. 1. 

- Class 4: 10 problems with 1000 < M < 5000, 500 < m < 1000, 50 < «, < 100 and 
Ya = 0.05. 
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- Class 5: 10 problems with 5000 < M < 10000, 500 < m < 1000, 50 < n t < 100 and 
y A = 0.01. 

Scenarios. We also considered two different scenarios as in the previous example: 
Scenario I: Similar to the previous example, with this scenario, we aimed at comparing 
Algorithms [T] and |2] 2pDecompAlg, pdDecompAlg, ADMM-vl, PCBDM and EPBDM. In this 
scenario, we chose: [Iq,uq] = [—0.01,0.01], [lb,Ub] = [0j 100], [/a,«a] = [—1, 1] and r XQ = 1. 
Scenario II: In this scenario, we only tested two first variants of ADMM and compared them 
with the four first algorithms. Here, we chose [Iq,uq] = [0.0,0.0] (i.e. without quadratic 
term), [/£,«/,] = [0, 100], [Ia,ua] = [—1, 1] and r XQ = 10. 

Results. For Scenario I, we see that the size of the problems is in 20 < M < 9938, 50 < m < 
999 and 695 < n < 741646. The performance profiles of the algorithms are plotted in Figure 
[3] The results on this collection shows that Algorithm [T]is the best one in terms of number 



Total number ot iteration 



Total computational time 




PCBDM 

EPBDM 



Fig. 3 Performance profiles on Scenario II in log 2 scale by using IpOpt: Left-Number of iterations, 
Right-Computational time. 

of iterations. It could solve up to 41 /50 (82%) problems with the best performance, while 
ADMM-vl solved 10/50 (20%) problems with the best performance. Algorithm [T]is also the 
best one in terms of computational time. It could solve 50/50 (100%) problems with the 
best performance. PCBDM was very slow compared to the rest in this scenario. 

For Scenario II, the size of the problems was varying in 20 < M < 9200, 50 < m < 946 
and 695 < n < 684468. The performance profiles of the tested algorithms are plotted in 
Figure [4] We can see from these performance profiles that Algorithm [T] is the best one in 




Fig. 4 Performance profiles in log 2 scale for Scenario I by using IpOpt: Left-Number of iterations, Right- 
Computational time. 

terms of number of iterations. It could solve up to 30/50 (60%) problems with the best per- 
formance, while this number were 3/50 (6%) and 20/50 (40%) problems in 2pDecompAlg 
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and ADMM-vl, respectively. Algorithm [T] was also the best one in terms of computational 
time. It solved all problems with the best performance. ADMM-v2 was slow compared to the 
rest in this scenario. 

From the above two numerical tests, we can observe that Algorithm [T] performs well 
compared to the rest in terms of computational time due to its low cost per iteration. ADMM 
encounters some difficulty regarding the choice of the penalty parameter as well as the 
effect of matrix A. Theoretically, PCBDM has the same worst-case complexity bound as 
Algorithms [T]and ©. However, its performance is quite poor. This happens due to the choice 
of the Lipschitz constant L\ of the gradient of the dual function and the evaluation of the 
quantity Dx ■ 



7 Concluding remarks 

We have proposed a new decomposition algorithm based on the dual decomposition and ex- 
cessive gap techniques. The new algorithm requires to perform only one primal step which 
can be parallelized efficiently, and two dual steps. Consequently, the computational com- 
plexity of this algorithm is very similar to other dual based decomposition algorithms from 
the literature, but with a better theoretical rate of convergence. Moreover, the algorithm auto- 
matically updates both smoothness parameters at each iteration. We notice that the dual steps 
are only matrix-vector multiplications, which can be done efficiently with a low computa- 
tional cost in practice. Furthermore, we allow one to solve the primal convex subproblem of 
each component up to a given accuracy, which is always the case in any practical implemen- 
tation. An inexact switching variant of Algorithm Q]has also been presented. Apart from the 
inexactness, this variant allows one to simultaneously update both smoothness parameters 
instead of switching them. Moreover, it improves the disadvantage of Algorithm[T]when the 
constant a* in Theorem 14. H is relatively small, though it did not outperform Algorithm [T]in 
the numerical tests. The worst-case complexity of both new algorithms is at most 0(1/ e) for 
a given tolerance e > 0. Preliminary numerical tests show that both algorithms outperforms 
other related existing algorithms from the literature. 
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A The details of the proofs 

In this appendix we provide the full proof of Theorem 14. II Theorem l5.ll Lemmaf^TJand Lemma l4~2l 
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A.l. The proof of Theorem \4. 1\ Let us denote by y 2 := y*(x; ft), x 1 :=x* (v; ji\ ) and i 1 = x*(y; ft ). From the 
definition of /, the second line of )33t and (35). we have 

; ft+ ) := <t> (x + ) + ¥ (x+ ; p+ ) li "= 1S1 <?. ( ( 1 - T)x+ tx l ) + max ( [A ( ( 1 - %)x +Z x 1 )-b] T y- ^-\\y\\ 2 \ 
! {(1 - t) [<l,(x) + (Ax-b) T y- & llvll 2 ] w + T [</>(*' ) + (Ai» -b) T y] [2] }. (64) 



p — convex + 

max • 

vfcR" 1 



Now, we estimate two terms in the last line of d64t . First we note that a T y — ^ \\y\\ 2 — \\a\\ 2 — § ||y — jj<z|| 2 
for any vectors a and y and j3 > 0. Moreover, since y 2 is the solution of the strongly concave maximization 
1141 with a concavity parameter ft, we can estimate 

[■]„] := $(x) + (Ax-bfy- y ||y|| 2 = ^ X ) + ^\\Ax-bf ~^\\y-ff 

= ftf + V (x;h) - y \\y-f II 2 ^ /(*,fc) - y lly -y 2 II 2 (65) 

f24l ft, )- concave ft, 

<^ «(y; ft ) - y lly - y 2 II 2 + 5 < g(y;ft ) + V y s(y; ft ) r (y - y) - y ||y - y 2 1| 2 + 5 
fg(y;ft)+V ) .g(y;ft) r (y-y)-§||y-y 2 || 2 +(y-y) r A(x'-f 1 )+5. 

Alternatively, by using i29) , the second term [•]»] can be estimated as 

Hp, :=0(.r 1 ) + (A.r 1 -fo) r y 

= (JE 1 ) + (AS 1 - b) T y + ft p x (x 1 ) + (Ax 1 - fo) r (y - y) - ft p x (X 1 ) 

f 1 0(x') + (Ax 1 -fo) r y+ft / , x (x 1 ) + (Ax I -fo) 7 '(y-y)-ft W (x 1 ) + ^ e 2 (T] 

1+U ^^;ft)+v,^;ft) r (y-«-ft W (* 1 )+| e f CT] . 

Next, we consider the point « := y + r(y — y) with T G (0,1). On the one hand, it is easy to see that if y £ ] 
then u 6 K" . Moreover, we have 

( 1 - T ) (y - y ) + T (y - y ) = y + T (y - y ) - y = u - y , 
u — y — « — (1 — T)y— Ty 2 = T(y — y 2 ). 



(66) 



(67) 



On the other hand, it follows from J37t that 

(1 — T) La Lemmnim 

^y^ft>^ > i*(ft), i=l,...,M. (68) 
By substituting )65t and )66t into 1641 and then using 1671 and {68), we conclude that 

f(x + ;p+) <max{(l-T)[-][i]+f[-]pj} 

. { ( 1 - T)g(y; ft ) + Tg(y; ft ) + V v g(j>; ft ) r [( 1 - t) (y - y) + r(y - y)] 

- (1 ' 2 T)fe lb - f II 2 } - ^ft Px (i 1 ) + O.Srft ef a] + ( 1 - r) 5 + ( 1 - t) (y - # A (*' - i 1 ) 

max L(y;ft ) + V v g(y;ft ) r (« - y) - ||« -y|| 2 } 

ueM™ |_ 2t- J J pi 

+ [o.Stft ef a] + ( 1 - T) 5 + ( 1 - t) (y - y) r A(x' - i 1 ) - rft Px (•?' )] . (69) 



< max < 
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Let us consider the first term [ ][ 3 ] of i69\ . We see that 



[■] [3] = max \g(y;lii) + Vyg(,m) T (u-9)- ( * „ j ft \\u-9f 



BEE™ 



2t 2 

£ s (ft) 



h-n 2 } 



= g(y;pi) + V y g(J;j3i) (y -y) — ||y -y|| 



b;ft)+v,g(j>; ft ) r (y + -y) - >1I 2 + (y + -y) r A(x' ) 



(70) 



cm 



< g(y + ;ft) + (.v + -.y)'A(^ 1 -i 1 ) 
f g(y + ;ft + ) + (ft -ft + K(^(y + ;ft + )) + (y + -y) r A(x 1 -*') 
m < m g(y + ;ft + ) + [tofrDx + (t-y) T A(x l -x 1 )] . 



In order to estimate the term [-]| 4 ] + we can observe that 



(y + -y)-(i-*)(y-y) 



f33llin e I 



Li(h)- l (M l -b) + (l-T)T(f-y) 



= L g (j3 l )- l A(x l -x c )+U(fii)- i {Ax c -b) - (1 - T)ry 
+ ftT 1 ( 1 - t) tA (x - x c ) + j%" 1 ( 1 - T ) T (Ax c - i) , 



which leads to 



A T [(^-«-(l-T)(y-y)] <L A 1 ft||A|| 2 ||x 1 -.v c ||+L A I ft||A r (A.^-fo)|| + 
+ ft" 1 (1 - T)T||A r (Ax c + (1 - T)T||A||||y||. 



(l-f)T 
ft 



l|A|| 2 ||x-^|l 



(71) 



Note that similar to d85t . we have H* 1 — x c \\ < D a and \\x — < D a . By substituting these estimates into 
\ and using the definitions of [-Jui and [-Jpi we have 

[•][4] + [-][5] < (1 - + + *ft (SDx - P*(*')) + f^Q + (1 - t)T (jg + ||A||||y||)] (72) 

By combining {69), )70t and 1721 and noting that aDx — Px (x ) < 0, we obtain 

f(x+;p+) Kg^-.P^+^iaDx-pxix^ + il-^S + rjiz^ukJ,^ 

<«(f f ;A + ) + (i-T)«+i?(*,ft,ft,y,e) 
= s(y + ;ft + ) + <5 + , 

which is indeed inequality )24t w.r.t. ft + , ft*" and 5 + . □ 

A.2. The proof of Theorem 1X71 Let us denote by y^ = y*(i;ft + ), x 1 := Jt*(y;ft), x 1 := i* (y;ft), 5* + := 
P(.v;ft + ) and — Jc'll^ := CT,- — x]\\ 2 . From the definition of g( ;ft), the second line of 15 U and 
ft + = (1 - f)ft we have 

g(y + ; ft+) = min {<j>(x) + (y+ ) r (Ax -b) + ft Px (x) } 

linejlTil 



min 

.vex 



{ (1 - t) [</> (x) + f (Ax -b) + p lP x(x)] [l] + T[<p (x) + (yl) T (Ax -b)\ 2] ). (73) 



First, we estimate the term in )73t . Since each component of the function in []||| is strongly convex w.r.t. 
Xj with a convexity parameter ft o", > for i = 1, . . . ,M, by using the optimality condition, one can show that 



ID 

> min 



113 01 , n 

> f(x ; fc) + f\\ x -x l \\l-S. 



{Hx) + f(Ax-b) + p l px(x)} + ^\\x-x [ \\l^g(y-p 1 ) + ^-\\x-x [ \\ 2 a 



Pi 

2 



(74) 
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Moreover, since <fr(x;fh) = ^\\Ax—b\\ 2 = ^^-\\Ax- b\\ 2 = (1 - T)y(x;/3 2 + ), by substituting this relation 



into il5i we obtain 



[■][!] >0« + V .(.f;ft) + ^||x-.r 1 || 2 a -5 

= 0(x) + V (x;p+) - W(x;ft) + §- \\x-x 1 \\i-8 (75) 
d >"'0« + v/(-v 2 ;ft + )+V,v/(^;ft + ) r (^ 2 ) + ^ \\x-x l \\ 2 a -S + ^\\A(x-x 2 )\\ 2 - W (x;p+). 

Here, the last inequality follows from the fact that y/(.v;j3, + ) = — j^HAx — b\\ 2 . 

Next, we consider the term [■] m of i73\ . We note that = i (A* 2 — fo). Hence, 

Pi 

[■] [2] = *(x) + (v 2 + ) r A(x- a 2 ) + (A* 2 

L= n ™ a El 0W + ViV/(v 2. ft+) r (jt _ x 2 )+ 1 ||Av 2 _ fo|| 2 (76) 

f>2 

= (.r) + Wix 2 ; jS+ ) + V, v/(.v 2 ; 0+ f (x - x 2 ) + V (x 2 ; jl+ ) . 

From the definitions of || ■ \\ a , D a and £i a \ we have IIjc-^Uct <D a , \\£ -x°\\ a <D a and -x l \\ a < & a] . 
Moreover, \\x — x l \\ a > \ \\x — ,t' \ \ a — \\x* — .f 1 || CT |. By using these estimates, we can derive 

\\x~Al >[||*-.v 1 ||„-||.v 1 -.v l || a ] 2 

= \\x-x l \\ 2 a -IWx-x'U^ -x [ \\ a + \\x l -x'\\l 

> 1^-/111-211^ -x% 01*-*%+ p 1 

Furthermore, the condition | |51I can be expressed as 



(77) 



>\\x-x [ \\l -4D a e la] . 



. Af||A ; || : 



^>J^=L7m^...,M. (78) 



By substituting (75), (76) and (77) into {73} and then using (78) and note that r(x- x 1 ) = (1 - t)j+tx- j 2 , 
we obtain 

g(y + ;/3 + ) =min{(l-T)[-] |1|+ T[-] |2] } 



^>^min{(l-T)0(.f) + T0(.v)+Vv/(.v 2 ;ft + ) r [(1-t)(x-* 2 )+t(.v-* 2 )] + ||.t-.v'|| 2 } 



-(1-t)5 + 



w(x 2 ■ ft+ ) - ( 1 - 1) tv(* ; ft + ) + ^p- II a(* - * 2 ) II 2 



(79) 



We further estimate j791 as follows 

— convex 



91— convex *- 

V + ;j3, + ) > mm\ip((\-T)x+Tx)+S7\fr(x-,Pf)((l-z)x+Tx-x 2 ) 
+ (l-^fr {]x _ (| 2 j + _ (1 _ T)5 _ 2(1 _ T)jSlDffe[ff] 

f min {0(») + y(^;ft + )+V^ 2 ;fe + )(H-x 2 )+ (1 : t 2 )/3l ||»-x 2 || 2 | 



(l-T)x+«exi- 
-2(l-T)ftZ) <J e w -(l--r)5 + [-] [3] 

Ln{f(iO+y(iift + )+V^ 

= ? (^ + ,y 2 ;j3+) -2(1 - z)^D a e [a] - (1 - t)5 + [-] [3] 

_ q(x+,yl;P+) - 2(1 - T)jStD CT e [CT] - (1 - t)« + - 0.5e 2 

f/(f + ;ft + ) -2(1 - T)ft /)„£[„] - (1 - T)5+[-] [3] -0.5e A 2 , (80) 
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where e A := [E^if "(j3+ )e, 2 ] '/ 2 . 

To complete the proof, we estimate [•]r 3 i as follows 

[•] [3] = TV/(.v 2 ;ft+) - 1(1 - TM*ft + ) + ^?II A ^-^H 2 

= M A * 2 - b f- *U - *)U*-b\\ 2 + (1 - T)||A(i -,v 2 )|| 2 ] (81) 
= ^\\(Ax 2 -b) + (l-t)(Ax-b)f>0. 

By substituting 48 1 1 into )80t and using the definition of S + in 452t we obtain 

g(.v + ;ft + )>/(f + ;ft + )-5 + , 

where S + := (1 - T)S + 2pi (1 - t)D a e [a] +0.5lJ£ 1 L, v '( J 8 2 + )e, 2 = (l-T)« + §(T,ft,ft,e). This is indeed 
{24) with the inexactness <5 + . □ 

A. 3. 77ie proof of Lemma \4J\ For simplicity of notation, we denote by x* := x* ({)"', fi\ ) andi* := x*(0"';j}i ), 
/j(-;3i,j3i ) := /!,(■;>', j3i ), where /i, is defined in Definition |3~T1 By using the inexactness in inequality (20} 
and y c = 0"', we have h(x*;y,P\ ) < h(x*;y,ji[ ) + Ifte 2 ^ which is rewritten as 

4 Of )+p lP x(T) < (** ) + ft J* (** ) - *(0"' ;ft ) + y efa] ■ ( 82 ) 

Since g(-;/j| ) is concave, by using the estimate 1121 and V y g(0 m ; j3i ) = Ax* — b we have 

g(/;ft)>g(0"«;j3 1 ) + V,g({r;ft) r y , - : ^f^liy , || 2 
= g ((r;jS 1 ) + (Ac*-fe) I "y ) -^liy ) || 2 

f + ft/* (*) + (Ax* - bff - W. \\ff - | e f CT] (83) 
= + (AT - bf? - ^il||v-°|| 2 + (f) T A(x* -r) + p lPx (x*) - ^-ef a] := 7b. 
Since \\x* — Jc* 1 1 < £mi, px(x*) > p x > and;y° is the solution of U4t . we estimate the last term To of (83) as 
To > + max {(Ax* -bfy- ^-^||v|| 2 } - \\A T ?\\\\x* -St\\ - |ef CT] 



ED+EU 



> N ) + max {(AT -*) r y- f ||y|| 2 }- l|AYl|e[i] - f ef ff] 



(84) 



l|A y ll e [i] + y e [ CT ] 



Now, we see that p' x + f | |S? - 1 1 2 < p Xj (x° ) < sup p Xj (*,•) = D Xi . Thus, 1 1*? - jcf 1 1 2 < £ (D X) . - pi ) < ^ 
for all ! = 1, . .. ,M. By using the definition of D a in (29), the last inequalities imply 

\\x°-x c \\<D a . (85) 

Finally, we note that A r v° = IsTJjTJ A r (Ax° — fo) due to (30). This relation leads to 

\\A T f\\ =L^P,)- , \\A T (Ax -b)\\=L^p t y l \\A T (A(x°-x l -)+Ax c -b)\\ 

<LS(ft)-» [||A r A||||.v°-.r c || + ||A r (A.v r -fo)||] T^'ft [||A|| 2 Ar + \\A T (Axf - b)\\] (86) 



2S 



Quoc Tran-Dinh et al. 



By substituting (85) and (86) into (84) and then using the definition of So we obtain the conclusion of the 
lemma. □ 

A.4. The vroof of Lemma K51 Let us consider the function t,(t;a) := . -, 2 — — , where a 6 [0,11 

and t > 2. After few simple calculations, we can estimate f + a < \/t 3 j(t — 2a) + 1 < t + 1 for all t > 
2max{l,a/(l — a)}. These estimates lead to 

2(f + 2) _1 <§(t;B)<2(» + l + a) -1 ,Vf > 2max{l,0f/(l- a)}. 

From the update rale (38) we can show that the sequence {tk}k>0 satisfies := ^(2/T t ;a^). If we 
define := then = §(Tfc;ajt). Therefore, one can estimate + 1 + Ctj < tk+\ < ft + 2 for ft > 
2max{l,Cft/(l — Cft)}. Note that Cft > a* by Assumption A I3. H and by induction we can show that to + 
(1 + a*)k < ft < to + 2k for k > and to > 2max{l,a*/(l - a*)}. However, since % = 2-, this leads to: 

11 1 1 



fc+l/lb k + t /2- "-0.5(1 + a*)/t + « /2 0.5(1 + Of* )k+ 1/To ' 

which is indeed (39). Here, < To = 2/fo and To < [max{l,a*/(l - a*)}] -1 . 

In order to prove (40), we note that (1 — OftTj.)(l — Tk+i) = By induction, we have IT/LoO — 

T- 

O^T^-i) n?=o(l — T t) = T ?^ .By combining this relation and the update rale (35), we deduce that ft;8* +1 = 
T o 

fiOjjO ill^li, which is me third statement of (40). 

T o 

Next, we prove the bound on ft*. Since ft* +1 = /3f n?=o ( 1 - a i z >)' we have i^nLoO - T i) < ft*" 1 "' < 
$\[ k j=(l (\- a*Tj). By using the following elementary inequalities -t-t 2 < ln(l -t) < -t for all f e [0, 1/2], 
we obtain ft fT 5 !^ <ft* +I < j3 1 °e" a * s i , where Si := TLc T,- andS? := F.Ln if. From (39). on the one hand, 
we have 

* 1 * 1 

£/+TAo " Sl " S 0.5(1 + «*)/+ 1/to' 

which leads to ln(i + 1/xb) + mT o < Si < 5(i+a*) ' n (^ + V T o) + 7o for some constant Jo- On the other 
hand, we have Si converging to some constant y% > 0. Combining all estimates together we eventually get 
^Fy ft < t — 1 . for some positive constant y. 



Finally, we estimate the bound on ft*. Indeed, it follows from (39) that ft* +1 = ft nLoO _ T *) ^ 
Pill^ol 1 k+\/to > ~ Pi *+l/T _ tb*+l ' 
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